Work done by the research group of DANS, its members and former members.

Labs (all on GitHub)

Maintained on GitHub by Dirk Roorda

Text Fabric

About

A library for processing texts and their annotations. Think of the Hebrew Bible, with 4 MB of text and 33 million annotations.

It also defines a minimalistic model for such data, realised in a text-based file format, .tf.
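
To make the idea of a minimalistic model concrete, here is a small sketch in Python. It is an illustration only: the class `AnnotatedText`, its methods, and the feature names are hypothetical and are not Text Fabric's API or its .tf format. It assumes that pieces of text are identified by integer nodes and that each annotation is a named feature mapping nodes to values.

```python
from collections import defaultdict


class AnnotatedText:
    """Toy model (hypothetical): nodes are integers, features map nodes to values."""

    def __init__(self):
        # feature name -> {node: value}
        self.features = defaultdict(dict)

    def annotate(self, feature, node, value):
        """Attach a value to a node under the given feature name."""
        self.features[feature][node] = value

    def value(self, feature, node):
        """Return the value of a feature for a node, or None if absent."""
        return self.features.get(feature, {}).get(node)

    def nodes_with(self, feature, value):
        """Return the sorted nodes whose feature equals the given value."""
        items = self.features.get(feature, {}).items()
        return sorted(n for n, v in items if v == value)


# Three word nodes, each with a text and a part-of-speech annotation.
corpus = AnnotatedText()
for node, word in enumerate(["In", "the", "beginning"], start=1):
    corpus.annotate("text", node, word)
corpus.annotate("part_of_speech", 1, "preposition")
corpus.annotate("part_of_speech", 2, "article")
corpus.annotate("part_of_speech", 3, "noun")
```

Keeping each feature in its own mapping means that one annotation layer can be stored, loaded, or ignored independently of the others, which is one plausible reason to prefer such a flat model for very large annotation sets.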

Tech

This is a Python package written with close attention to performance. Many of its loops routinely iterate millions of times, so we have been careful to use the parts of Python that perform well.
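
As a rough illustration of what choosing well-performing parts of Python can mean, the sketch below counts annotation values in two ways. It is not taken from Text Fabric's source; the function names and the sample data are hypothetical. In CPython, handing the whole iteration to a built-in such as `collections.Counter` is often faster than an explicit loop with per-item dictionary bookkeeping, although the actual gain depends on the data and the interpreter version.

```python
from collections import Counter


def count_values_loop(annotations):
    """Count values with an explicit Python-level loop."""
    counts = {}
    for value in annotations:
        if value in counts:
            counts[value] += 1
        else:
            counts[value] = 1
    return counts


def count_values_builtin(annotations):
    """Count values by delegating the iteration to collections.Counter."""
    return Counter(annotations)


# Hypothetical sample data: 4000 part-of-speech annotations.
annotations = ["noun", "verb", "noun", "article"] * 1000
```

Both functions produce the same counts, so the faster one can be substituted without changing results; measuring with `timeit` on realistic data is the safest way to decide.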

Docs

The model, file format and API are documented in its wiki.

Status

The library is in full production use at the ETCBC for its flagship dataset, Biblia Hebraica Stuttgartensia Amstelodamensis. See also the pipeline that transports data from the institute to its website, shebanq.

Author

Dirk Roorda